An Empirical Comparison of Discretization Methods

Authors

  • Dan Ventura
  • Tony R. Martinez
Abstract

Many machine learning and neurally inspired algorithms are limited, at least in their pure form, to working with nominal data. However, for many real-world problems, some provision must be made to support processing of continuously valued data. This paper presents empirical results obtained by using six different discretization methods as preprocessors to three different supervised learners on several real-world problems. No discretization technique clearly outperforms the others. Also, discretization as a preprocessing step is in many cases found to be inferior to direct handling of continuously valued data. These results suggest that machine learning algorithms should be designed to directly handle continuously valued data rather than relying on preprocessing or ad hoc techniques.
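As a concrete picture of what discretization as a preprocessing step involves, the following minimal sketch maps continuous values to nominal bin indices using equal-width binning. The abstract does not name the six methods compared, so equal-width binning is used here only as a generic illustration; the function name `equal_width_bins` and the sample data are hypothetical.

```python
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Map each continuous value to a bin index in 0..n_bins-1."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls in a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land at index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical continuous attribute, discretized into three nominal values.
temperatures = [12.0, 15.5, 19.0, 23.5, 30.0]
labels = equal_width_bins(temperatures, 3)
```

A learner restricted to nominal data would then consume `labels` in place of the raw values, which is where the information loss discussed in the abstract can arise.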


Similar resources

Error-Based and Entropy-Based Discretization of Continuous Features

We present a comparison of error-based and entropy-based methods for discretization of continuous features. Our study includes both an extensive empirical comparison as well as an analysis of scenarios where error minimization may be an inappropriate discretization criterion. We present a discretization method based on the C4.5 decision tree algorithm and compare it to an existing entropy-based ...


Comparison of different empirical methods for estimating daily reference evapotranspiration in the humid cold climate (case study: Borujen, Shahrekord, Koohrang and Lordegan)

The method proposed for calculating potential evapotranspiration is the FAO Penman-Monteith method, but other methods require less meteorological data and yield estimates close to those of the FAO Penman-Monteith method under different climatic conditions. Evaluating the performance of these methods on a common basis is a prerequisite for selecting an alternative approach in accordance with available da...


An Empirical Comparison between Grade of Membership and Principal Component Analysis

It is the purpose of this paper to contribute to the discussion initiated by Wachter about the parallelism between principal component (PC) and a typological grade of membership (GoM) analysis. The author tested empirically the close relationship between both analyses in a low-dimensional framework comprising up to nine dichotomous variables and two typologies. Our contribution to the subject is also...


Proposal and Empirical Comparison of a Parallelizable Distance-Based Discretization Method

Many classification algorithms are designed to work with datasets that contain only discrete attributes. Discretization is the process of converting the continuous attributes of the dataset into discrete ones in order to apply some classification algorithm. In this paper we first review previous work in discretization, then we propose a new discretization method based on a distance proposed by ...


An Evolutionary Multi-objective Discretization based on Normalized Cut

Learning models and related results depend on the quality of the input data. If raw data are not properly cleaned and structured, the results tend to be incorrect. Therefore, discretization, as one of the preprocessing techniques, plays an important role in learning processes. The most important challenge in the discretization process is to reduce the number of features’ values. This operat...




Publication date: 1995